Abstract

We describe the implementation of the PhotoZ code in the framework of the
Astro-WISE package and as part of the Photometric Classification Server of the
PanSTARRS pipeline. Both systems allow the automatic measurement of photometric
redshifts for the millions of objects being observed in the PanSTARRS project
or expected to be observed by future surveys like KIDS, DES or EUCLID.

1 Introduction

Since the completion of the Sloan Digital Sky Survey (SDSS) [1], optical
astronomy has moved on from the detailed studies of single objects to a phase
where catalogues with millions of entries can be produced. This has allowed
for detailed statistical studies of entire populations, as well as searches for
extremely rare objects. Consequently, astronomers are forced to update their
approach to data analysis and to embed their codes in database-supported
applications that provide efficient automatic procedures and easy
administration of the analysis processes. One such case is the measurement of
photometric redshifts for the hundreds of millions of galaxies that ongoing or
future optical and near-infrared photometric surveys will deliver.

We are directly involved in three large scale imaging surveys. The Panoramic
Survey Telescope and Rapid Response System 1 (PanSTARRS1, see [2]) project
started regular operations in May 2010 and is producing a 5 band (grizy)
survey of 3/4 of the sky that, at the end of the foreseen 3 years of
observations, will be ≈ 1 mag deeper than SDSS. Approximately two hundred
million galaxies, a similar number of stars, about a million quasars and
≈ 7000 Type Ia Supernovae will be detected. VIKING (VISTA Kilo-Degree Infrared
Galaxy Survey¹) is a near-infrared 4 band (ZJHK) survey of 1500 square degrees
of extragalactic sky that started in December 2009 at the VISTA telescope. This
will be complemented in 5 optical bands (ubgri) by the Kilo Degree Survey
(KIDS²) at the VST telescope, to start in October 2011. Finally, the Dark
Energy Survey (DES³) will image 5000 square degrees around the southern
galactic pole in 4 optical bands (bgri) at the CTIO telescope. Looking to the
future, we are participating in the EUCLID⁴ bid. If approved, the satellite
will image 20000 square degrees of the extragalactic sky in the optical and
NIR channels, providing unprecedentedly deep photometry for many millions of
galaxies.

The science driving these projects ranges from Baryonic Acoustic Oscillations
and growth of structure, to weak shear, galaxy-galaxy lensing and lensing
tomography. All of them rely on the determination of accurate photometric
redshifts for extremely large numbers of galaxies. Further science goals, like
the detection of high redshift quasars and galaxies, the discovery of very cool
stars, or the study of galaxy evolution with cosmic time will also profit from
the availability of good photometric redshifts and star/galaxy photometric
classification. Therefore, in the last few years we have designed and
implemented schemes to derive and keep organized photometric redshifts,
probability distributions and star/galaxy classification for extremely large
datasets.

Here we describe two aspects of these efforts: the PhotoZ implementation
for Astro-WISE and the PanSTARRS1 Photometric Classification Server. The
structure of the paper is as follows. Sect. 2 considers the algorithm at the
core of our implementations and its recent scientific use. Sect. 3 discusses
the implementation of the code for large data sets. In Sect. 3.1 we present its
Astro-WISE incarnation; we give examples of its use and evaluate its
performance in Sect. 3.2. Sect. 3.3 is dedicated to the implementation of the
code for the PanSTARRS1 survey. We draw our conclusions in Sect. 4.

2 The PhotoZ code: algorithm and science applications 

In the last decade several efficient codes for the determination of photometric
redshifts have been developed, and a fair summary of these efforts would go
well beyond the scope of the present contribution. In short, there are mainly
two approaches, one based on empirical methods, the other on template fitting.
In the first case one tries to parametrize the low-dimensional surface in
color-redshift space that galaxies occupy using low-order polynomials,
nearest-neighbor searches or neural networks [3,4]. These codes extract the
information directly from the data, given an appropriate training set with
spectroscopic information. Template fitting methods work instead with a set of
model spectra from observed galaxies and stellar population models [HliniEUH].

1 http://www.eso.org/public/teles-instr/surveytelescopes/vista/surveys.html
2 http://www.strw.leidenuniv.nl/~kuijken/KIDS/
3 http://www.darkenergysurvey.org/
4 http://sci.esa.int/science-e/www/area/index.cfm?fareaid=102

The PhotoZ code that we have implemented under Astro-WISE belongs to the
second category and its original incarnation is described in [5]. The code
estimates redshifts z by comparing a set of discrete template SEDs T to the
broadband photometry of the (redshifted) galaxies. For each SED the full
redshift likelihood function, including priors for redshift, absolute
luminosity and SED probability, is computed using Bayes' theorem:

$$P(z, T \mid C, M, ...) \propto p(C \mid z, T)\, p(z, T \mid M), \qquad (1)$$

where C is the vector of measured colors, M the galaxy absolute magnitude,
$p(C \mid z, T) \propto \exp(-\chi^2/2)$ is the probability of obtaining a
normalized $\chi^2$ for the given dataset with its errors, redshift and
template T, and $p(z, T \mid M)$ the prior distribution. This is a product of
parametrized functions of the type:



where the variable y stands for redshift or absolute magnitudes. Typically
we use n = 0, p = 6 or 8, and $\hat{y}$ and $\sigma_y$ with appropriate values
for mean redshifts and ranges, or mean absolute magnitudes and ranges, which
depend on the SED type. The set of galaxy templates is semi-empirical and is
chosen to map the color space spanned by the different types of objects at
different redshifts. The original set [5] includes 31 SEDs describing a broad
range of galaxy spectral types, from early to late to star-bursting objects.
Recently, we added a set of SEDs tailored to fit luminous red galaxies and one
SED to represent the average QSO spectrum at redshift ≈ 2 [10]. Furthermore,
the method also fits a set of stellar templates, allowing a star/galaxy
classification and an estimate of the line-of-sight extinction for stellar
objects. The templates typically cover the wavelength range λ = 900 Å up to
25000 Å (with the QSO template covering instead 300-8000 Å) and are sampled
with a step typically 10 Å wide (varying from 5 to 20 Å; the QSO SED has
Δλ = 1 Å). The method has been extensively tested and applied to several
photometric catalogues with spectroscopic follow-ups. Given a (deep)
photometric dataset covering the wavelength range from the U to the K band,
excellent photometric redshifts with δz/(1 + z) ≈ 0.03 up to z ≈ 5, with at
most a few percent catastrophic failures, can be derived for every SED type
([H], [12], [E]). When applied to the 5 filter band catalogs ugriz of SDSS [H]
or grizy of PanSTARRS [10], the code delivers δz/(1 + z) ≈ 0.02 for luminous
red galaxies up to redshift ≈ 0.5. A more detailed description of the
scientific merits of PhotoZ goes beyond the scope of this paper; see [15] to
compare these performances to the ones achieved by other packages. The code is
available in Fortran and C++ versions.
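
As an illustration of the template-fitting scheme behind Eq. (1), the following
is a minimal sketch and not the PhotoZ implementation. It makes several
assumptions: a flux-based chi-square with a free amplitude stands in for the
comparison of colors, only a redshift prior is applied (no absolute magnitude
term), and the prior shape $y^n \exp[-\ln 2\,|(y-\hat{y})/\sigma_y|^p]$ is one
form consistent with the symbols n, p, $\hat{y}$ and $\sigma_y$ named in the
text, not a formula recovered from it. All function and variable names are
hypothetical.

```python
import math

def chi2(obs_flux, obs_err, model_flux):
    """Chi-square of one template after solving for the best flux amplitude."""
    num = sum(f * m / e ** 2 for f, m, e in zip(obs_flux, model_flux, obs_err))
    den = sum(m * m / e ** 2 for m, e in zip(model_flux, obs_err))
    scale = num / den  # analytic least-squares amplitude
    return sum(((f - scale * m) / e) ** 2
               for f, m, e in zip(obs_flux, model_flux, obs_err))

def prior(y, y_hat, sigma_y, n=0, p=6):
    """ASSUMED prior shape: y**n * exp(-ln2 * |(y - y_hat) / sigma_y|**p)."""
    return (y ** n) * math.exp(-math.log(2.0) * abs((y - y_hat) / sigma_y) ** p)

def posterior(obs_flux, obs_err, model_grid, z_prior):
    """Normalized P(z, T | data) on a discrete (template, redshift) grid.

    model_grid[T][z] is the list of model fluxes per band;
    z_prior[T] is the pair (z_hat, sigma_z) for template T.
    """
    post = {}
    for T, by_z in model_grid.items():
        z_hat, sigma_z = z_prior[T]
        for z, model_flux in by_z.items():
            likelihood = math.exp(-0.5 * chi2(obs_flux, obs_err, model_flux))
            post[(T, z)] = likelihood * prior(z, z_hat, sigma_z)
    norm = sum(post.values())
    return {key: value / norm for key, value in post.items()}

# Toy grid: two made-up templates, two redshifts, three bands.
grid = {
    "red":  {0.1: [1.0, 2.0, 3.0], 0.5: [1.0, 3.0, 6.0]},
    "blue": {0.1: [3.0, 2.0, 1.0], 0.5: [2.0, 2.0, 2.0]},
}
priors = {"red": (0.5, 1.0), "blue": (0.5, 1.0)}
obs = [2.0, 6.0, 12.0]  # exactly twice the "red" model at z = 0.5
err = [0.2, 0.2, 0.2]
post = posterior(obs, err, grid, priors)
best = max(post, key=post.get)
```

In this toy case the observed fluxes are a scaled copy of one grid entry, so
that entry has zero chi-square and dominates the posterior. The same loop run
over a stellar library would give the best stellar fit used for a star/galaxy
comparison.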

3 Implementation for large datasets 

The science applications described in the previous section dealt with a few
thousand objects and could be managed by simple means, i.e. ASCII-based
catalogues. In the era of all-sky surveys and/or very deep fields, where
millions, if not billions, of objects are imaged, this approach is doomed to
fail. The support of a database, the automation of the procedures and the tools
to administer the testing and analysis of the results become essential
ingredients for a successful science project. Therefore, having in mind our
participation in the PanSTARRS1 survey and future projects like KIDS, DES and
possibly EUCLID (see Introduction), we designed and implemented two packages,
PhotoZ for Astro-WISE (see Sects. 3.1 and 3.2) and the Photometric
Classification Server (PCS) for PanSTARRS1 (see Sect. 3.3).

3.1 PhotoZ for Astro-WISE

We embedded the PhotoZ code in Astro-WISE following the general philosophy of
the package. A Python wrapper (PhotRedCatalog) interfaces the Oracle Astro-WISE
database to the (Fortran) code, providing the necessary reading, executing and
writing calls to construct the ASCII input files with the photometry vectors,
call the (compiled Fortran) PhotoZ code and transfer the ASCII output back into
the database. As usual in every Astro-WISE application, each persistent entry
created in this last phase allows the backward tracing of the components down
to the single raw and calibration frames that went into the production of the
photometry used in the process. An option for the a posteriori evaluation of
the full redshift probability distribution for a list of selected objects is
provided. A separate routine (PhotRedConfig) folds the available spectral
energy distributions with the given filter curves on a predefined grid of
redshifts, to maximize the speed of the PhotoZ code for a given photometric
set. Parallelization is obtained by splitting the list of objects to be
analysed into smaller chunks and executing separate calls of PhotoZ on the
multiple cluster nodes. Visualization routines make it possible to plot the
best-fitting SED, the best-fitting stellar SED, the datapoints and the redshift
probability distribution of selected objects. A schematic description of the
structure of the PhotoZ code is given in Fig. 1.
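
The folding step attributed to PhotRedConfig above can be pictured with the
following minimal sketch. It is not the Astro-WISE code: the function names are
hypothetical, the band flux is taken as a plain transmission-weighted mean of
the redshifted SED, and flux dimming and photon-counting conventions are
ignored for brevity.

```python
def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; zero outside the tabulated range."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

def band_flux(sed_wave, sed_flux, filt_wave, filt_trans, z):
    """Transmission-weighted mean flux of an SED redshifted to z (trapezoids)."""
    # Observed-frame wavelength w samples the rest-frame SED at w / (1 + z).
    vals = [interp(w / (1.0 + z), sed_wave, sed_flux) * t
            for w, t in zip(filt_wave, filt_trans)]
    num = den = 0.0
    for i in range(1, len(filt_wave)):
        dw = filt_wave[i] - filt_wave[i - 1]
        num += 0.5 * (vals[i] + vals[i - 1]) * dw
        den += 0.5 * (filt_trans[i] + filt_trans[i - 1]) * dw
    return num / den

def build_grid(seds, filters, redshifts):
    """Precompute grid[sed_name][z] -> list of model fluxes, one per filter."""
    return {name: {z: [band_flux(sw, sf, fw, ft, z) for fw, ft in filters]
                   for z in redshifts}
            for name, (sw, sf) in seds.items()}

# Toy inputs: a flat SED and one triangular filter (wavelengths in Angstroms).
seds = {"flat": ([1000.0, 20000.0], [2.0, 2.0])}
filters = [([4000.0, 5000.0, 6000.0], [0.0, 1.0, 0.0])]
grid = build_grid(seds, filters, redshifts=[0.0, 0.5, 1.0])
```

Because the grid depends only on the SEDs, the filters and the redshift
sampling, it can be computed once per photometric set and reused for every
object, which is the speed-up the text describes.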

3.2 Examples and Performances 

PhotoZ runs under Astro-WISE as implemented at the Munich node on the
PanSTARRS cluster, a Beowulf machine with 175 nodes (each with four 2.6 GHz
CPUs and 6 GB of memory, for a total of 700 CPUs) and 180 TB of disk space,
attached to a PB robotic storage device, mounted at the Max-Planck
Rechenzentrum in Garching. Two servers run the Oracle database. The Munich
Astro-WISE node is federated with the central node of Groningen.

Fig. 1 Flowchart of the basic functionality of PhotoZ. Top part: SEDs, the
stellar library and the filter curves are retrieved from the data server, and
the SEDs are multiplied with the filter curves to compute the relative fluxes
in each band. The results are then stored again on the data servers. Lower
part: To create a PhotRedCatalog object, the system retrieves the necessary
files, creates an AssociateList of the input SourceLists (i.e. matches the
lists in RA and DEC), computes the photometric redshifts using the information
from the PhotRedConfig, and finally links the resulting PhotoZ SourceList with
the AssociateList.

As an example of how the system works, we describe the derivation of the
photometric redshifts of galaxies detected in the Medium Deep Field 4 (MDF04)
of PanSTARRS1 (see also [10]) in an Astro-WISE session. Below, "awe>" denotes
the Python Astro-WISE prompt. For a detailed description of how to run the
commands discussed below we refer to the Astro-WISE manual⁵.

We first ingest the PanSTARRS1 filter curves:

awe> photredfilter = PhotRedFilter(pathname='PS_g.filter')
awe> photredfilter.filter=(Filter.mag_id=='PS_g g')
awe> ...
awe> photredfilter.make()

where PS_g.filter is an ASCII file with two columns, the wavelength in
Angstroms and the transmission of the PanSTARRS1 g filter at this wavelength.
We repeat the process for the filters r, i, z and y. Then we configure the
system, specifying the galaxy and stellar libraries (see [5]):

awe> filt = (Filter.name == 'PS_g')[0]
awe> pfg = (PhotRedFilter.filter == filt)[0]
awe> ...
awe> pse = (PhotRedSED.sed_name == 'mod_e.sed')[0]
awe> psl = (PhotRedSED.sed_name == 'mod_s210.sed')[0]
awe> ...
awe> starlib=(PhotRedStarlib.filename=='starlib_pickles.lis')[0]
awe> pc = PhotRedConfig()
awe> pc.SEDs=[pse,psl,...]
awe> pc.filters=[pfg,pfr,pfi,pfz,pfy]
awe> pc.starlib=(starlib)[0]
awe> pc.name='PanSTARRS1_MDF04'
awe> pc.make()

5 http://www.astro-wise.org/portal/aw_howtos.shtml
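
The filter files ingested in this session are described as two-column ASCII
tables of wavelength and transmission. A minimal sketch of reading such a file
outside Astro-WISE could look as follows; the function name, the sample values
and the handling of '#' comment lines are assumptions made for illustration.

```python
import io

def read_filter_curve(handle):
    """Parse 'wavelength transmission' pairs, skipping blanks and '#' comments."""
    wave, trans = [], []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        w, t = line.split()[:2]
        wave.append(float(w))
        trans.append(float(t))
    return wave, trans

# In-memory stand-in for a file such as PS_g.filter (values are made up).
example = io.StringIO(
    "# wavelength[A] transmission\n"
    "4000 0.0\n"
    "4800 0.9\n"
    "5500 0.0\n"
)
wave, trans = read_filter_curve(example)
```

With a real file, the `io.StringIO` object would be replaced by an open file
handle.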

We now ingest the PanSTARRS1 photometric catalogue into the SourceLists
sg, sr, si, sz, sy and generate the photometric redshifts with the commands:

awe> pr = PhotRedCatalog()
awe> pr.config=pc
awe> pr.master=sg
awe> pr.sourcelists=[sg,sr,si,sz,sy]
awe> pr.name='PanSTARRS1_MDF04'
awe> pr.make()

The results are stored in the pr.associate_list AssociateList and can be
examined through the Oracle database tools and/or the Python awe> prompt. For
example, the command

awe> pr.plot(23)

plots the best-fitting SED, the best-fitting stellar SED, the datapoints and
the redshift probability distribution for the object with identification
number 23 in the associate_list.

The derivation of photometric redshifts for the ≈ 350000 entries in the
MDF04 photometric catalogue down to the r = 24 magnitude takes 3.3 sec
if our full PanSTARRS cluster (700 CPUs) is available (i.e. ≈ 150 objects
per second per CPU). Through the federation mechanism the results can be
seen from each Astro-WISE federated node that has the relevant permissions
to access the data. We are in the process of optimizing the SEDs and
validating the photometric redshifts through available spectroscopic data for
the PanSTARRS1 filter set. First results are discussed in [10], where a
precision of ≈ 0.02(1 + z) for luminous red galaxies up to redshift z ≈ 0.5 is
achieved.
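
The throughput quoted in this section can be checked with simple arithmetic.
The sketch below uses only numbers given in the text (350000 objects, 3.3 sec,
175 nodes with 4 CPUs each) and assumes that the work is spread evenly over
the cluster.

```python
n_objects = 350_000  # entries in the MDF04 catalogue
wall_time_s = 3.3    # quoted run time on the full cluster
n_nodes = 175        # cluster nodes
n_cpus = 700         # 175 nodes x 4 CPUs

total_rate = n_objects / wall_time_s  # objects per second, whole cluster
rate_per_cpu = total_rate / n_cpus
rate_per_node = total_rate / n_nodes

print(f"{rate_per_cpu:.0f} objects/s per CPU, "
      f"{rate_per_node:.0f} objects/s per node")
```

Under these assumptions the figure of about 150 objects per second corresponds
to a single CPU; a four-CPU node processes roughly four times as many.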

3.3 The Photometric Classification Server for PanSTARRS1

The Photometric Classification Server (PCS) for PanSTARRS1 provides software
tools to perform a photometric star/QSO/galaxy classification and to compute
photometric redshifts for galaxies and (a subset of) best-fitting temperature,
metallicity, gravity and interstellar extinction parameters for stars. A
detailed description of the system can be found in [16], [17] and [10]. The
code is interfaced to the Published Science Products System (PSPS) database of
PanSTARRS1 (see [18]), based on Microsoft SQL and inspired in its structure by
the SDSS database. The "manual mode" of operations is similar to the one
described in Sect. 3.1. The user can query the database and run the code
off-line through SQL commands and calls to shell scripts, or through a web
interface. This mode is useful when optimizing the SEDs using available
datasets with spectroscopic redshifts. The normal mode of operation, however,
is fully automated. The interface to PSPS triggers the analysis of newly
produced object catalogues of the static sky on a regular basis. When new
entries are found, CAS-like jobs extract them from the database in Hawaii,
format them on our PanSTARRS cluster and submit multiple runs of the C++
version of PhotoZ with sub-blocks of data to parallelize the processing.
Finally, the resulting photometric redshifts are stored in a local MySQL
database and the corresponding table is pulled by the central one in Hawaii.
The process of downloading the MDF04 catalogue from the PSPS database
(≈ 2 minutes), measuring the photometric redshifts on our PanSTARRS cluster
(≈ 30 sec if 700 CPUs are available) and providing the results for pulling by
the PSPS database (≈ 2 minutes) takes at most 5 minutes. Therefore we expect
to sustain the regular flow of new photometric data from PanSTARRS1 without
problems. The system is open to the implementation of further, different
approaches to photometric redshifts (see Sect. 2).
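
The sub-block parallelization described above can be sketched as follows. This
is an illustration under stated assumptions, not the PCS code: `run_photoz_stub`
is a hypothetical stand-in for one PhotoZ run, and a local thread pool replaces
the job submission to cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def run_photoz_stub(chunk):
    """Hypothetical stand-in for one PhotoZ call on a sub-block of objects."""
    return [(obj_id, 0.0) for obj_id in chunk]  # (id, placeholder redshift)

def process_catalogue(object_ids, n_workers):
    """Run the stub on each sub-block in parallel and merge results in order."""
    chunks = split_into_chunks(object_ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_photoz_stub, chunks))  # keeps chunk order
    return [row for block in results for row in block]

rows = process_catalogue(list(range(10)), n_workers=3)
```

Since each object is fitted independently, the sub-blocks need no
communication, and merging the outputs in chunk order reproduces the input
ordering of the catalogue.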

4 Conclusions 

PhotoZ under Astro-WISE and PCS for PanSTARRS1, the systems to compute
accurate photometric redshifts for large datasets described in this paper,
are up and running. They are ready to analyse and archive the photometric
catalogues with millions of entries that the wide area surveys started recently
or starting in the near future will provide. They can be considered as
prototypes for the future development of the data analysis schemes of EUCLID

PS- 